-
-
Notifications
You must be signed in to change notification settings - Fork 718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure restart clears taskgroups et al #6944
Conversation
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ±0 15 suites ±0 6h 40m 20s ⏱️ + 13m 19s For more details on these failures, see this check. Results for commit 35db050. ± Comparison against base commit c15a10e. ♻️ This comment has been updated with latest results. |
distributed/tests/test_scheduler.py
Outdated
@@ -615,11 +615,19 @@ async def test_ready_remove_worker(s, a, b): | |||
|
|||
@gen_cluster(client=True, Worker=Nanny, timeout=60) | |||
async def test_restart(c, s, a, b): | |||
from distributed.scheduler import TaskState | |||
|
|||
before = TaskState._instances |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're creating a new reference to the same WeakSet. The assertion below is always true because it's the same object.
Also, I suspect this may be prone to race conditions with late/delayed cleanup from previous tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right of course. I don't even know if this is a sane test, to be honest. We've been looking into TaskState garbage collection before and this is a bit tricky. I'll probably just remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
Very interested to see if this has an effect on coiled/benchmarks#253 |
This does indeed seem to fix coiled/benchmarks#253. The Average memory for that test module: Duration: Interestingly, it doesn't seem to have an enormous effect on other tests. |
Restart keeps TaskGroups, TaskPrefixes, etc. around.
This can have a non-trivial impact on scheduling, e.g.
distributed/distributed/scheduler.py
Line 2648 in c15a10e
distributed/distributed/scheduler.py
Lines 1802 to 1805 in c15a10e
distributed/distributed/scheduler.py
Lines 2015 to 2022 in c15a10e
and there are possibly more collections where this matters. In a perfect world, all mutable state would be encapsulated in a dedicated class of which we could create a new instance in case of a restart but we're very far away from this. I think these are the most important collections to clear.